<!DOCTYPE html>
<html lang="zh-cn">
<head>
   
    <link type="text/css" rel="stylesheet" href="/bundles/blog-common.css?v=KOZafwuaDasEedEenI5aTy8aXH0epbm6VUJ0v3vsT_Q1"/>
<link id="MainCss" type="text/css" rel="stylesheet" href="/skins/ThinkInside/bundle-ThinkInside.css?v=RRjf6pEarGnbXZ86qxNycPfQivwSKWRa4heYLB15rVE1"/>
<link type="text/css" rel="stylesheet" href="/blog/customcss/428549.css?v=%2fam3bBTkW5NBWhBE%2fD0lcyJv5UM%3d"/>

</head>
<body>
<a name="top"></a>


<div id="topics">
	<div class = "post">
		<h1 class = "postTitle">
			<a id="cb_post_title_url" class="postTitle2" href="https://www.cnblogs.com/frankdeng/p/9092512.html">Development Tools: A Detailed Guide to Developing Spark Programs</a>
		</h1>
		<div class="clear"></div>
		<div class="postBody">
			<div id="cnblogs_post_body" class="blogpost-body"><h2>I.&nbsp; Developing Spark Programs with IDEA</h2>
<h3>1. Open the IntelliJ IDEA website: http://www.jetbrains.com/idea/</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510115726835-1561376144.png" alt="" /></p>
<h3>2. Click DOWNLOAD and install the edition that suits your needs; the free Community edition is sufficient.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510115823070-527459524.png" alt="" /></p>
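<h3>3. Double-click the ideaIU-15.0.2.exe installer and click Next. (The IU prefix marks the Ultimate edition installer used in these screenshots; the Community edition installer starts with ideaIC instead.)</h3>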
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510115845948-1027875370.png" alt="" /></p>
<h3>4. Choose the installation path and click Next.</h3>
<p><img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510115907802-417905064.png" alt="" /></p>
<h3>5. Optionally create a desktop shortcut, then click Next.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510115921582-517515557.png" alt="" /></p>
<h3>6. Click Install.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510115941319-1099006189.png" alt="" /></p>
<h3>7. The installation runs.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510115953824-1488520928.png" alt="" /></p>
<h3>8. Click Finish; the installation is complete.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120006595-1659453269.png" alt="" /></p>
<h3>9. Double-click the IntelliJ IDEA 15.0.2 icon to launch IntelliJ IDEA.</h3>
<p><img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120025739-573555376.png" alt="" /></p>
<h3>10. Import your existing settings if you have any; otherwise select the lower option, then click OK.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120039094-986780014.png" alt="" /></p>
<h3>11. Choose the UI theme you prefer.</h3>
<p>&nbsp;&nbsp;(1)&nbsp;Theme 1</p>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120050459-439373217.png" alt="" /></p>
<p>(2)&nbsp;Theme 2</p>
<p><img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120123887-932784379.png" alt="" />&nbsp;</p>
<h3>12. After choosing a theme, click Next: Default plugins.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120137719-1998940573.png" alt="" /></p>
<h3>13. Click Next: Featured plugins.</h3>
<p><img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120213149-578231917.png" alt="" /></p>
<h3>14. Click Install under Scala (Custom Languages).</h3>
<p><img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120231958-643981598.png" alt="" /></p>
<h3>15. The plugin installation runs.</h3>
<p><img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120254435-1105123220.png" alt="" /></p>
<h3>16. When the button shows Installed, the plugin was installed successfully; then click Start using IntelliJ IDEA.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120311441-162423728.png" alt="" /></h3>
<h3>17. Click Create New Project to create a new project.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120323048-1314360682.png" alt="" /></p>
<h3>18. Select Scala and click Next.</h3>
<p><img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120341762-765744024.png" alt="" /></p>
<h3>19. Fill in Project name and Project location.</h3>
<p><img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120400450-129978413.png" alt="" /></p>
<h3>20. To set the Project SDK, click New.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510120424904-1850898468.png" alt="" /></p>
<h3>21. In the small menu that New opens, click JDK.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510123756904-252972223.png" alt="" /></p>
<h3>22. Select the directory where the JDK is installed and click OK.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510123756838-753284936.png" alt="" /></p>
<h3>23. Project SDK now shows your installed JDK version, as in the figure below.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510123813150-809448149.png" alt="" /></p>
<h3>24. To set the Scala SDK, click Create.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510123840324-856372686.png" alt="" /></h3>
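<h3>25. Select the 2.10.x version installed on this machine (the prebuilt Spark 1.6.0 binaries are compiled against Scala 2.10), then click OK.</h3>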
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510123850555-1340076000.png" alt="" /></p>
<h3>26. The dialog now looks like the figure; click Finish.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510123904846-677969103.png" alt="" /></p>
<h3>27. When this prompt appears, click OK.</h3>
<p>&nbsp;&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510123914881-817206856.png" alt="" /></p>
<h3>28. When this window appears, uncheck Show Tips on Startup and click Close.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510123924508-482476498.png" alt="" /></p>
<h3>29. After the project is created, its directory structure looks like this:</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510123933373-2076021158.png" alt="" /></p>
<h3>30. Download <a href="http://www.apache.org/dyn/closer.lua/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz">spark-1.6.0-bin-hadoop2.6.tgz</a> and extract it; the extracted directory looks like this:</h3>
<p align="justify">&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510123948031-682227623.png" alt="" /></p>
<h3>31. Add the Spark jar dependency: go to File -&gt; Project Structure -&gt; Libraries, click the + button, and select Java.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510123957730-1634561993.png" alt="" /></p>
<h3>32. Go to the lib directory of the extracted spark-1.6.0-bin-hadoop2.6, select spark-assembly-1.6.0-hadoop2.6.0.jar as shown below, then click OK.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124008691-912038691.png" alt="" /></p>
<h3>33. Click OK.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124018049-1507078776.png" alt="" /></p>
<h3>34. The dialog looks like the figure below; click OK.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124028869-296241692.png" alt="" /></p>
<h3>35. The project now looks like the figure below.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124039719-1797748202.png" alt="" /></p>
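<p>Once a Scala source file can be created under src, it can help to confirm that the assembly jar really is on the project classpath before writing any Spark code. The following is a minimal sketch and not part of the original walkthrough: the object name <code>SparkClasspathCheck</code> and the helper <code>sparkLocation</code> are hypothetical, and the check relies only on standard JVM reflection.</p>
<pre><code class="language-scala">// Hypothetical helper object, used only to verify the library setup.
object SparkClasspathCheck {

  // Returns the location (normally the jar file) that SparkContext was loaded from.
  def sparkLocation(): String =
    classOf[org.apache.spark.SparkContext]
      .getProtectionDomain
      .getCodeSource
      .getLocation
      .toString

  def main(args: Array[String]): Unit = {
    println("SparkContext loaded from: " + sparkLocation())
  }
}
</code></pre>
<p>If the library was added as described, the printed location should point at spark-assembly-1.6.0-hadoop2.6.0.jar; a compile error on the class name suggests the jar is not attached to the module.</p>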
<h3>36. Right-click src -&gt; New -&gt; Package.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124048859-2020573709.png" alt="" /></p>
<h3>37. Enter the package name (com.dt.spark in this example) and click OK.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124059279-1017486230.png" alt="" /></p>
<h3>38. Right-click com.dt.spark -&gt; New -&gt; Scala Class.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124109487-710788558.png" alt="" /></p>
<h3>39. Enter WordCount as the Name, select Object as the Kind, and click OK.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124121171-1145265304.png" alt="" /></p>
<h3>40. Add a main method to WordCount, as shown below.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124132060-1554174009.png" alt="" /></p>
<h3>41. Start writing the Spark WordCount program: create a SparkConf, set the application name, and configure it to run in local mode (step 1 in the figure).</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124140854-1187451528.png" alt="" /></p>
<h3>42. Create the SparkContext object (step 2 in the figure).</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124150091-1985005282.png" alt="" /></p>
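<p>Since steps 1 and 2 appear only as screenshots, here is a minimal text sketch of what they might look like. The object name <code>ContextSketch</code> and the helper <code>buildConf</code> are hypothetical; <code>setAppName</code>, <code>setMaster</code>, <code>version</code> and <code>stop</code> are standard Spark 1.6 calls.</p>
<pre><code class="language-scala">import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object name; the tutorial itself puts this code in WordCount.main.
object ContextSketch {

  // Step 1: describe the application. "local" runs Spark inside this JVM,
  // so no cluster is needed while developing in the IDE.
  def buildConf(): SparkConf =
    new SparkConf().setAppName("WordCount").setMaster("local")

  def main(args: Array[String]): Unit = {
    // Step 2: the SparkContext is the entry point used to create RDDs.
    val sc = new SparkContext(buildConf())
    println("Running Spark " + sc.version)
    sc.stop()
  }
}
</code></pre>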
<h3>43. Read a local file (step 3 in the figure).</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124201356-759388390.png" alt="" /></h3>
<h3>44. Split each line into individual words (step 4.1 in the figure).</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124211968-864785503.png" alt="" /></p>
<h3>45. Map each word occurrence to a count of 1, i.e. word =&gt; (word, 1) (step 4.2 in the figure).</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124223877-1777547845.png" alt="" /></h3>
<h3>46. Sum those per-occurrence counts to get the total number of times each word appears in the file (step 4.3 in the figure).</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124233810-1925435489.png" alt="" /></h3>
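<p>The three transformations in steps 4.1 to 4.3 can be tried without an input file. The sketch below is an illustration under assumptions: the object name <code>TransformSketch</code>, the helper <code>countWords</code> and the two sample lines are invented here, and <code>sc.parallelize</code> stands in for the file read from step 3.</p>
<pre><code class="language-scala">import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical object; it applies steps 4.1 to 4.3 to in-memory lines.
object TransformSketch {

  def countWords(sc: SparkContext, lines: Seq[String]): Map[String, Int] = {
    // 4.1: split every line into words.
    val words = sc.parallelize(lines).flatMap(_.split(" "))
    // 4.2: pair each word occurrence with the count 1.
    val pairs = words.map(word =&gt; (word, 1))
    // 4.3: add up the 1s for each distinct word.
    val counts = pairs.reduceByKey(_ + _)
    // Bring the (small) result back to the driver as a Map.
    counts.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TransformSketch").setMaster("local")
    val sc = new SparkContext(conf)
    // Sample input made up for this sketch.
    println(countWords(sc, Seq("hello spark", "hello scala")))
    sc.stop()
  }
}
</code></pre>
<p>For the two sample lines, the resulting map holds hello with a count of 2 and spark and scala with a count of 1 each; the order in which the entries print is not guaranteed.</p>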
<h3>47. Print the results (step 5 in the figure).</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124244808-1069741568.png" alt="" /></p>
<h3>48. Stop the SparkContext (step 6 in the figure).</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124256979-1366842175.png" alt="" /></p>
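<p>Putting steps 1 to 6 together, the finished program might look roughly like the sketch below. It is a reconstruction from the step descriptions, not a transcript of the screenshots: variable names are guesses, and the input path is a placeholder that must be replaced with a real local file (the README.md shipped in the extracted Spark directory is one candidate).</p>
<pre><code class="language-scala">package com.dt.spark

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Step 1: application name and local execution mode.
    val conf = new SparkConf().setAppName("WordCount").setMaster("local")

    // Step 2: create the SparkContext.
    val sc = new SparkContext(conf)

    // Step 3: read a local text file (placeholder path, adjust to your machine).
    val lines = sc.textFile("path/to/spark-1.6.0-bin-hadoop2.6/README.md")

    // Step 4.1: split each line into words.
    val words = lines.flatMap(line =&gt; line.split(" "))
    // Step 4.2: map each word occurrence to (word, 1).
    val pairs = words.map(word =&gt; (word, 1))
    // Step 4.3: sum the counts per word.
    val wordCounts = pairs.reduceByKey(_ + _)

    // Step 5: print the results. In local mode this prints to the IDE console.
    wordCounts.foreach { case (word, count) =&gt; println(word + " : " + count) }

    // Step 6: release the resources held by the context.
    sc.stop()
  }
}
</code></pre>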
<h3>49. Run the program: right-click the WordCount.scala file&nbsp;-&gt; Run&nbsp;&lsquo;WordCount&rsquo;.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124306977-1115408543.png" alt="" /></p>
<h3>50. Output like the following means the run succeeded.</h3>
<p>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510124320396-1633716961.png" alt="" /></p>
<h2>II.&nbsp; Developing Spark Programs with Scala IDE</h2>
<h3>1. Open the Scala IDE for Eclipse website: http://scala-ide.org/</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133436151-1421599177.png" alt="" /></h3>
<h3>2. Click Download IDE.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133441877-305868083.png" alt="" /></h3>
<h3>3. Download the build that matches your platform.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133451131-165450401.png" alt="" /></h3>
<h3>4. Taking scala-SDK-4.3.0-vfinal-2.11-win32.win32.x86_64.zip as an example, extract the archive.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133457150-677297809.png" alt="" /></h3>
<h3>5. Double-click eclipse.exe to launch it.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133504410-580566445.png" alt="" /></h3>
<h3>6. Choose a workspace directory and click OK.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133530677-385111231.png" alt="" /></h3>
<h3>7. In the window that opens, choose File -&gt; New -&gt; Scala Project.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133537420-686399731.png" alt="" /></h3>
<h3>8. Enter the Project name and click Next.</h3>
<h3>&nbsp;&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133544826-705862632.png" alt="" /></h3>
<h3>9. Click Finish.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133552239-915426344.png" alt="" /></h3>
<h3>10. Next, change the JRE System Library (steps 11 to 19).</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133559297-633293321.png" alt="" /></h3>
<h3>11. Right-click JRE System Library -&gt; Build Path -&gt; Configure Build Path...</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133629974-801144976.png" alt="" /></h3>
<h3>12. Select JRE System Library and click Edit.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133636609-1843883767.png" alt="" /></h3>
<h3>13. Select Alternate JRE and click Installed JREs...</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133643803-1623980651.png" alt="" /></h3>
<h3>14. Click Add...</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133649839-1457999055.png" alt="" /></h3>
<h3>15. Select Standard VM and click Next.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133703021-1629297277.png" alt="" /></h3>
<h3>16. Click Directory..., select the local JDK installation directory, and click Finish.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133709340-1972060530.png" alt="" /></h3>
<h3>17. Select the JDK you just added and click OK.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133719013-620572246.png" alt="" /></h3>
<h3>18. Select the JDK you just added from the drop-down list and click Finish.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133725954-1307852902.png" alt="" /></h3>
<h3>19. Click OK.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133733413-1757222653.png" alt="" /></h3>
<h3>20. Next, set the Scala library container (steps 21 to 24).</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133740028-894949852.png" alt="" /></h3>
<h3>21. Right-click the project&nbsp;-&gt; Properties.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133749117-886631321.png" alt="" /></h3>
<h3>22. In the window that opens, click Scala Compiler.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133755827-997786776.png" alt="" /></h3>
<h3>23. Check Use Project Settings, open the Scala Installation drop-down list, select Latest 2.10 bundle (dynamic), and click OK.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133803494-1076349784.png" alt="" /></h3>
<h3>24. Click OK.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133844152-2041084201.png" alt="" /></h3>
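<p>The downloaded Scala IDE bundle in this example targets Scala 2.11, while the prebuilt Spark 1.6.0 binaries are compiled against Scala 2.10, which is why the 2.10 bundle is selected above. One way to double-check which Scala library the project actually runs against is a throwaway object like the sketch below; the object name <code>ScalaVersionCheck</code> is hypothetical, and <code>scala.util.Properties</code> is part of the Scala standard library.</p>
<pre><code class="language-scala">// Hypothetical helper object, used only to verify the project's Scala version.
object ScalaVersionCheck {

  // The version number of the Scala library on the runtime classpath.
  def scalaVersion(): String = scala.util.Properties.versionNumberString

  def main(args: Array[String]): Unit = {
    println("Scala library version: " + scalaVersion())
  }
}
</code></pre>
<p>With the settings from the steps above, the printed version would be expected to start with 2.10.</p>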
<h3>25. Download <a href="http://www.apache.org/dyn/closer.lua/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz">spark-1.6.0-bin-hadoop2.6.tgz</a> and extract it; the extracted directory looks like this:</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133853042-813035135.png" alt="" /></h3>
<h3>26. Add the Spark jar dependency: right-click the project&nbsp;-&gt; Build Path -&gt; Configure Build Path...</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133933867-1536365401.png" alt="" /></h3>
<h3>27. Click Libraries -&gt; Add External JARs...</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133941788-707820451.png" alt="" /></h3>
<h3>28. Select spark-assembly-1.6.0-hadoop2.6.0.jar in the lib directory and click Open.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133949900-1513399643.png" alt="" /></h3>
<h3>29. Click OK.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510133958782-171411176.png" alt="" /></h3>
<h3>30. Create a package in the project: right-click src -&gt; New -&gt; Package.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134007645-1562415061.png" alt="" />&nbsp;</h3>
<h3>31. Enter the Name and click Finish.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134014979-1437711684.png" alt="" /></h3>
<h3>32. Create a Scala object: right-click com.dt.spark -&gt; New -&gt; Scala Object.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134022480-2009433919.png" alt="" /></h3>
<h3>33. Enter the Name and click Finish.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134032081-1374791508.png" alt="" /></h3>
<h3>34. Start writing WordCount, beginning with the title comment.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134039319-2128155272.png" alt="" /></h3>
<h3>35. Add the main method.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134047930-1897952195.png" alt="" /></h3>
<h3>36. Create the SparkConf object (step 1 in the figure).</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134054969-2073091133.png" alt="" /></h3>
<h3>37. Create the SparkContext object (step 2 in the figure).</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134105335-519754506.png" alt="" /></h3>
<h3>38. Read a local file (step 3 in the figure).</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134115877-947910423.png" alt="" /></h3>
<h3>39. Split each line into individual words (step 4.1 in the figure).</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134122321-1290083692.png" alt="" /></h3>
<h3>40. Map each word occurrence to a count of 1, i.e. word =&gt; (word, 1) (step 4.2 in the figure).</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134130181-516666114.png" alt="" /></h3>
<h3>41. Sum those per-occurrence counts to get the total number of times each word appears in the file (step 4.3 in the figure).</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134135706-1724388401.png" alt="" /></h3>
<h3>42. Print the results (step 5 in the figure).</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134144606-1182359918.png" alt="" /></h3>
<h3>43. Stop the SparkContext (step 6 in the figure).</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134150442-82119195.png" alt="" /></h3>
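<p>The screenshots for steps 1 to 6 cannot be copied as text, so here is a compact sketch of the same pipeline that could be pasted into the Scala IDE project. It is a variation rather than the original code: the object name <code>WordCountSorted</code>, taking the input path from the first program argument, the fallback file name and the sorting of results on the driver are all assumptions added for illustration.</p>
<pre><code class="language-scala">package com.dt.spark

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical variant of WordCount that prints the most frequent words first.
object WordCountSorted {
  def main(args: Array[String]): Unit = {
    // Assumption: the input path is the first argument, with a README.md
    // in the working directory as fallback.
    val path = if (args.nonEmpty) args(0) else "README.md"

    val conf = new SparkConf().setAppName("WordCountSorted").setMaster("local") // step 1
    val sc = new SparkContext(conf)                                             // step 2

    val counts = sc.textFile(path)  // step 3: read the file
      .flatMap(_.split(" "))        // step 4.1: lines to words
      .map(word =&gt; (word, 1))       // step 4.2: word to (word, 1)
      .reduceByKey(_ + _)           // step 4.3: total per word

    // Step 5: collect to the driver, sort by descending count, and print.
    counts.collect()
      .sortBy { case (_, count) =&gt; -count }
      .foreach { case (word, count) =&gt; println(word + " : " + count) }

    sc.stop()                       // step 6
  }
}
</code></pre>
<p>Collecting before sorting is only reasonable here because a single text file yields a small result; for large inputs the sort would normally stay on the RDD.</p>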
<h3>44. Run the project: right-click the WordCount.scala file&nbsp;-&gt; Run As -&gt; Scala Application.</h3>
<h3>&nbsp;&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134200216-1567703636.png" alt="" /></h3>
<h3>45. Output like the following means the run succeeded.</h3>
<h3>&nbsp;<img src="https://images2018.cnblogs.com/blog/1385722/201805/1385722-20180510134207326-721409897.png" alt="" /></h3></div><div id="MySignature"></div>
<div class="clear"></div>
<div id="blog_post_info_block">

</body>
</html>
